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Abstract 

We present an efficient computational approach to perform real-space electronic struc- 
ture calculations using an adaptive higher-order finite-element discretization of Kohn- 
Sham density- functional theory (DFT). To this end, we develop an a priori mesh adaption 
technique to construct a close to optimal finite-element discretization of the problem. We 
further propose an efficient solution strategy for solving the discrete eigenvalue problem 
by using spectral finite-elements in conjunction with Gauss-Lobatto quadrature, and a 
Chebyshev acceleration technique for computing the occupied eigenspace. Using the pro- 
posed solution procedure, we investigate the computational efficiency afforded by higher- 
order finite-element discretizations of the Kohn-Sham DFT problem. Our studies suggest 
that staggering computational savings — of the order of 1000— fold — can be realized, for 
both all-electron and pseudopotential calculations, by using higher-order finite-element 
discretizations. On all the benchmark systems studied, we observe diminishing returns in 
computational savings beyond the sixth-order for accuracies commensurate with chemi- 
cal accuracy, suggesting that the hexic spectral-element may be an optimal choice for the 
finite-element discretization of the Kohn-Sham DFT problem. A comparative study of 
the computational efficiency of the proposed higher-order finite-element discretizations 
suggests that the performance of finite-element basis is competing with the plane-wave 
discretization for non-periodic pseudopotential calculations, and is comparable to the 
Gaussian basis for all-electron calculations. Further, we demonstrate the capability of 
the proposed approach to compute the electronic structure of materials systems contain- 
ing a few thousand atoms using modest computational resources, and good scalability of 
the present implementation up to a few hundred processors. 

* Corresponding author 
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1 Introduction 

Electronic structure calculations have played a significant role in the investigation of ma- 
terials properties over the past few decades. In particular, the Kohn-Sham approach to 
density functional theory (DFT) [T] has made quantum-mechanically informed calcula- 
tions on ground-state materials properties computationally tractable, and has provided 
many important insights into a wide range of materials properties. The Kohn-Sham 
approach is based on the key result of Hohenberg &: Kohn [2] that the ground-state prop- 
erties of a materials system can be described by a functional of electron density. Though, 
the existence of an energy functional has been established by the Hohenberg-Kohn result, 
its functional form is not known to date. The work of Kohn & Sham [1] addressed this 
challenge in an approximate sense, and has laid the foundations for the practical appli- 
cation of DFT to materials systems. The Kohn-Sham approach reduces the many-body 
problem of interacting electrons into an equivalent problem of non-interacting electrons 
in an effective mean field that is governed by the electron density. This effective single- 
electron description is exact in principle for ground-state properties, but is formulated in 
terms of an unknown exchange-correlation term that includes the quantum-mechanical 
interactions between electrons. This exchange-correlation term is approximated using 
various models — commonly modeled as an explicit functional of electron density — and 
these models have been shown to predict a wide range of materials properties across 
various materials systems. We note that the development of increasingly accurate and 
computationally tractable exchange-correlation functionals is an active research area in 
electronic structure calculations. Though the Kohn-Sham approach greatly reduces the 
computational complexity of the original many-body Schrodinger problem, simulations 
of large-scale material systems with DFT are still computationally very demanding. Nu- 
merical algorithms which are robust, computationally efficient and scalable on parallel 
computing platforms are always desirable to enable DFT calculations at larger length 
and time scales, and on more complex systems, than possible heretofore. 

The plane-wave basis has traditionally been one of the popular basis sets used for 
solving the Kohn-Sham problem O HI [5]. The plane- wave basis allows for an efficient 
computation of the electrostatic interactions that are extended in real-space through 
Fourier transforms. However, the plane-wave basis also has some notable disadvantages. 
In particular, calculations are restricted to periodic geometries that are incompatible 
with most realistic systems containing defects (for e.g. dislocations). Further, the plane- 
wave basis provides a uniform spatial resolution which can be inefficient in the treat- 
ment of non-periodic systems like molecules, nano-clusters etc., or materials systems 
with defects, where higher basis resolution is often required in some spatial regions and 
a coarser resolution suffices elsewhere. Moreover, the plane-wave basis is non-local in 
real space, which significantly affects the scalability of computations on parallel com- 
puting platforms. Thus, the development of real-space techniques for electronic struc- 
ture calculations has received significant attention over the past decade, and we refer 
to O El El [H [ini EH E2I EH] and references therein for a comprehensive overview. Among 
the real-space techniques, the finite-element basis presents some key advantages — it is 
amenable to unstructured coarse-graining, allows for consideration of complex geome- 
tries and boundary conditions, and is scalable on parallel computing platforms. We refer 
to[IlEaE6lEZlESlEa[20l[2ll[22l[23l[Ml[2a[2Sl references therein, for a 
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comprehensive overview of the past efforts in developing real-space electronic structure 
calculations based on a finite-element discretization. 

While the finite-element basis is more versatile than the plane- wave basis [T8] , it is 
not without its shortcomings. Prior investigations have shown that linear finite-elements 
require a large number of basis functions — of the order of 100, 000 basis functions per 
atom — to achieve chemical accuracy in electronic structure calculations (cf. e.g. 122^129] ). 
and this compares very poorly with plane-wave basis or other real-space basis functions. 
It has been demonstrated that higher-order finite-element discretizations can alleviate 
this degree of freedom disadvantage of linear finite-elements in electronic structure cal- 
culations [231 |29l [30] . However, the use of higher-order elements increases the per basis- 
function computational cost due to the need for higher-order accurate numerical quadra- 
ture rules. Furthermore, the bandwidth of the matrix increases cubically with the order 
of the finite-element, which in turn increases the computational cost of matrix- vector 
products. In addition, since a finite-element basis is non-orthogonal, the discretization of 
the Kohn-Sham DFT problem results in a generalized eigenvalue problem, which is more 
expensive to solve in comparison to a standard eigenvalue problem resulting from using 
an orthogonal basis (for e.g. plane- wave basis). Thus, the computational efficiency af- 
forded by using a finite-element basis in electronic structure calculations, and its relative 
performance compared to plane-wave basis and other real-space basis functions (for e.g 
Gaussian basis), has remained an open question to date. 

A recent investigation in the context of orbital-free DFT has indicated that the use of 
higher-order finite-elements can significantly improve the computational efficiency of the 
calculations |31j . For instance, a 100 — 1000 fold computational advantage has been re- 
ported by using a fourth-order finite-element in comparison to a linear finite-element. In 
the present work, we extend this investigation to study the Kohn-Sham DFT problem and 
attempt to establish the computational efficiency afforded by higher-order finite-element 
discretizations in electronic structure calculations. To this end, we develop: (i) an a 
priori mesh adaption technique to construct a close to optimal finite-element discretiza- 
tion of the problem; (ii) an efficient solution strategy for solving the discrete eigenvalue 
problem by using spectral finite-elements in conjunction with Gauss-Lobatto quadrature, 
and a Chebyshev acceleration technique for computing the occupied eigenspace. We sub- 
sequently study the numerical aspects of the finite-element discretization of the formu- 
lation, investigate the computational efficiency afforded by higher-order finite-elements, 
and compare the performance of the finite-element basis with plane- wave and Gaussian 
basis on benchmark problems. 

The a priori mesh adaption technique proposed in this work is based on the ideas 
in [321 133] . and closely follows the recent development of the mesh adaption technique 
for orbital- free DFT [31] . We refer to [26[ [27] for recently proposed a posteriori mesh 
adaption techniques in electronic structure calculations. The mesh adaption technique 
proposed in the present work is based on minimizing the discretization error in the ground- 
state energy, subject to a fixed number of elements in the finite-element mesh. To this 
end, we first develop an estimate for the finite-element discretization error in the Kohn- 
Sham ground-state energy as a function of the characteristic mesh-size distribution, h{r), 
and the exact ground-state electronic fields comprising of wavefunctions and electrostatic 
potential. We subsequently determine the optimal mesh distribution by determining the 
h{r) that minimizes the discretization error. The resulting expressions for the optimal 
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mesh distribution are in terms of the degree of the interpolating polynomial and the 
exact solution fields of the Kohn-Sham DFT problem. Since the exact solution fields are 
a priori unknown, we use the asymptotic behavior of the atomic wavefunctions [M] away 
from the nuclei to determine the coarse-graining rates for the finite-element meshes used 
in our numerical study. Though the resulting finite-element meshes are not necessarily 
optimal near the vicinity of the nuclei, the mesh coarsening rate away from the nuclei 
provides an efficient way of resolving the vacuum in non-periodic calculations. 

We next implement an efficient solution strategy for solving the finite-element dis- 
cretized eigenvalue problem, which is crucial before assessing the computational efficiency 
of the basis. We note that the non-orthogonality of the finite-element basis results in a 
discrete generalized eigenvalue problem, which is computationally more expensive than 
the standard eigenvalue problem that results from using an orthogonal basis like plane- 
waves. We address this issue by employing a spectral finite-element discretization and 
Gauss-Lobatto quadrature rules to evaluate the integrals which results in a diagonal 
overlap matrix, and allows for a trivial transformation to a standard eigenvalue problem. 
Further, we use the Chebyshev acceleration technique for standard eigenvalue problems to 
efficiently compute the occupied eigenspace (cf. e.g. [35] in the context of electronic struc- 
ture calculations). Our investigations suggest that the use of spectral finite-elements and 
Gauss-Lobatto rules in conjunction with Chebyshev acceleration techniques to compute 
the eigenspace gives a 10 — 20 fold computational advantage, even for modest materials 
system sizes, in comparison to traditional methods of solving the standard eigenvalue 
problem where the eigenvectors are computed explicitly. Further, the proposed approach 
has been observed to provide a staggering 100 — 200 fold computational advantage over 
the solution of a generalized eigenvalue problem that does not take advantage of the 
spectral finite-element discretization and Gauss-Lobatto quadrature rules. In our imple- 
mentation, we use a self-consistent field (SCF) iteration with Anderson mixing |36j . and 
employ the finite-temperature Fermi-Dirac smearing [3] to suppress the charge sloshing 
associated with degenerate or close to degenerate eigenstates around the Fermi energy. 

We next study various numerical aspects of the finite-element discretization of the 
Kohn-Sham DFT problem. We begin our investigation by examining the numerical rates 
of convergence of higher-order finite-element discretizations of Kohn-Sham DFT. We 
remark here that optimal rates of convergence have been demonstrated for quadratic 
hexahedral and cubic serendepity elements in pseudopotential Kohn-Sham DFT calcula- 
tions [20\ [26], and mathematically proved for Kohn-Sham DFT for the case of smooth 
pseudopotential external fields [37j. We also note that there have been several works on 
the mathematical analysis of optimal rates of convergence for non-linear eigenvalue prob- 
lems \38 \ 139 ^ 140]. However, the mathematical analysis of optimal rates of convergence of 
the finite-element discretization of Kohn-Sham DFT problem involving Coulomb-singular 
potentials is an open question to date, to the best of our knowledge. In the present 
study, we compute the rates of convergence of the finite-element discretization of Kohn- 
Sham DFT for a range of finite-elements including linear tetrahedral element, hexahedral 
spectral-elements of order two, four and six. Two sets of benchmark problems are con- 
sidered in this study: (i) all-electron calculations on boron atom and methane molecule; 
(ii) pseudopotential calculations on a non-periodic barium cluster consisting of 2 x 2 x 2 
body-centered cubic (BCC) unit cells and a periodic face-centered cubic (FCC) calcium 
crystal. In these benchmark studies, as well as those to follow, the proposed a priori 
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mesh adaption scheme is used to construct the meshes. These studies show rates of con- 
vergence in energy of 0{h?^) for a finite-element whose degree of interpolation is k, which 
denote optimal rates of convergence as demonstrated in [201 EZ] ■ An interesting aspect of 
this study is that optimal rates of convergence have been observed even for all-electron 
calculations involving Coulomb-singular potentials, which, to the best of our knowledge, 
have not been analyzed or reported heretofore. 

We finally turn towards assessing the computational efficiency afforded by higher- 
order finite-element discretizations in Kohn-Sham DFT calculations. To this end, we use 
the same benchmark problems and measure the CPU-time for the solution of the Kohn- 
Sham DFT problem to various relative accuracies for all the finite-elements considered 
in the present study. We observe that higher-order elements can provide a significant 
computational advantage in the regime of chemical accuracy. The computational savings 
observed are about 1000-fold upon using higher-order elements in comparison with a 
linear finite-element for both all-electron as well as pseudopotential calculations. We 
observe that a point of diminishing returns is reached at the sixth-order for accuracies 
commensurate with the chemical accuracy, where the degree of freedom advantage of 
higher-order finite-elements is nullified by the increasing per basis- function costs. To 
further assess the effectiveness of higher-order finite-elements, we conduct pseudopotential 
calculations on large aluminium clusters ranging from 3x3x3 to 5x5x5 FCC unit 
cells using the hexic spectral finite-element, and compare the computational times with 
that of plane- wave basis discretization using ABINIT package [5]. We note that similar 
relative accuracies in the ground-state energies are achieved using the hexic finite-element 
with a lower computational cost in comparison to the plane-wave basis. Furthermore, 
we computed the electronic structure of an aluminum cluster of 7 x 7 x 7 FCC unit 
cells, containing 1689 atoms, with the finite-element basis using modest computational 
resources, which could not be simulated in ABINIT due to huge memory requirements. 

We also investigate the efficiency of higher-order elements in the case of all-electron 
calculations on a larger materials system and compare it with the Gaussian basis using 
the GAUSSIAN package [51]. In this case, the benchmark system is a graphene sheet 
containing 100 atoms. We find that the solution time using the finite-element basis is 
larger by a factor of 10 in comparison to Gaussian basis, and we attribute this differ- 
ence to the highly optimized Gaussian basis functions for C-C bonds that resulted in 
the far fewer basis functions required to achieve chemical accuracy. While this difference 
in the performance can be offset by the better scalability of finite-element discretiza- 
tion on parallel computing platforms, there is also much room for further development 
and optimization in the finite-element discretization of the Kohn-Sham DFT problem. 
For instance, especially in the context of all-electron calculations, the partitions-of-unity 
finite-element method with atomic-orbital enrichment functions can significantly reduce 
the required number of finite-element basis functions as recently demonstrated in j42] . 
and presents an important direction for further investigations. Finally, we assess the 
parallel scalability of our numerical implementation. We demonstrate the strong scaling 
up to 192 processors (limited by our access to computing resources) with an efficiency 
of 91.4% using a 172 atom aluminium cluster discretized with 3.91 million degrees of 
freedom as our benchmark system. 

The remainder of the paper is organized as follows. Section [2] describes the vari- 
ational formulation of the Kohn-Sham DFT problem followed by a discussion on the 
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discrete Kohn-Sham DFT eigenvalue problem. Section [3] develops the error estimates for 
the finite-element discretization of Kohn-Sham DFT, and uses these estimates to present 
an a priori mesh adaption scheme. Section S] describes our numerical implementation 
of the self-consistent field iteration of the Kohn-Sham eigenvalue problem, and, in par- 
ticular, discusses the efficient methodologies developed to solve the Kohn-Sham DFT 
problem using the finite-element basis. Section [5] presents a comprehensive numerical 
study demonstrating the computational efficiency afforded by higher-order finite-element 
discretizations in electronic structure calculations, and also provides a performance com- 
parison of finite-element basis with plane-wave and Gaussian basis. We finally conclude 
with a short discussion and outlook in Section [6l 

2 Formulation 

In this section, we describe the Kohn-Sham DFT energy functional and the associated 
variational formulation. We subsequently review the equivalent self-consistent formula- 
tion of the Kohn-Sham eigenvalue problem, and present the discretization of the formu- 
lation using a finite-element basis. 

2.1 Kohn-Sham variational problem 

We consider a material system consisting of electrons and M nuclei. The spinless 
Kohn-Sham energy functional describing the N electron system is given by [43 ^ [44] 

E{^, R) = r,(*) + EM + Eh{p) + Eecctip, R) + E,,iR), (1) 

where 

N 

p{r)=Y,m^)\' (2) 

i=l 

represents the electron density. In the above expression, we denote the spatial coordi- 
nate by r, whereas x = (r, s) includes both the spatial and spin degrees of freedom. 
We denote hy ^ = {V'i(x), 'i/'2(x), • • • ,'0Af(x)} the vector of orthonormal single electron 
wavefunctions, where each wavefunction -0, € X x {a, f3} can in general be complex- 
valued, and comprises of a spatial part belonging to a suitable function space X (elab- 
orated subsequently) and a spin state denoted by a{s) or /3(s). We further denote by 
R = {Ri, R2, ■ ■ ■ Rm} the collection of all nuclear positions. The first term in the 
Kohn-Sham energy functional in ([1]), Ts{^), denotes the kinetic energy of non- interacting 
electrons and is given by 

T,(*) = J2[ V'r(x) (-^vA M^)d^ , (3) 

i=l 

where ip* denotes the complex conjugate of ipi. E^c in the energy functional denotes the 
exchange-correlation energy which includes the quantum-mechanical many body inter- 
actions. In the present work, we model the exchange-correlation energy using the local 
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density approximation (LDA) |45^ 06] represented as 

Excip) = J exc(/9)p(r) dr , (4) 
where ea;c(yo) = £x{p) + £c{p), and 



£c{p) 



4 V^ry 

> 1, 



AlogTs + B + C rslogVs + Dts < 1, 



and rg = (3/47rp)^/^. Specifically, we use the Ceperley and Alder constants as given 
in [l6]. We remark that we have restricted the present formulation and study to LDA 
exchange-correlation functionals solely for the sake of clarity, and the formulation can 
be trivially extended (cf. e.g. [Mj) to local spin density approximation (LSDA) and 
generalized gradient approximation (GGA) exchange-correlation functionals. 

The electrostatic interaction energies in the Kohn-Sham energy functional in ([T]) are 
given by 

Eextip, R) = j p{r)Vext{v, R)dv = Y, j P(r)K7(r, Rj) dr , (8) 

where Eh is the Hartree energy representing the classical electrostatic interaction energy 
between electrons, Eext is the interaction energy between electrons and the external 
potential induced by the nuclear charges given by Vext = Vj(r, Rj) with Vj denoting 
the potential (singular Coulomb potential or pseudopotential) contribution from the J*'* 
nucleus, and E^z denotes the repulsive energy between nuclei with Zj denoting the charge 
on the nucleus. We note that in a non-periodic setting, representing a finite atomic 
system, all the integrals in equations ©-([H]) are over M"^ and the summations in (I8l)-(l9|) 
include all the atoms / and J in the system. In the case of an infinite periodic crystal, all 
the integrals over r in equations ([S])-® extend over the unit cell, whereas the integrals 
over r' extend in M'^. Similarly, in ([H])-® the summation over / is on the atoms in the 
unit cell, and summation over J extends over all lattice sites. We note that, in the context 
of periodic problems, the above expressions assume a single k-point (F— point) sampling. 
The computation of the electron density and kinetic energy in ^ and ^ for multiple 
k-point sampling involves an additional quadrature over the k-points in the Brillouin zone 

(cf. e.g mi). 

The electrostatic interaction terms as expressed in equations ([7])- ([9]) are nonlocal in 
real-space, and, for this reason, evaluation of electrostatic energy is the computationally 
expensive part of the calculation. Following the approach in |48l I24j . the electrostatic 
interaction energy can be reformulated as a local variational problem in electrostatic 
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potential by observing that |^ is the Green's function of the Laplace operator. To this end, 

M 

we represent the nuclear charge distribution by 6(r, R) = — J2 ^i^Rii^)^ where Zj6Rj{r) 

1=1 

represents a bounded smooth charge distribution centered at Ri, either corresponding 
to a pseudopotential, or, in the case of all-electron calculations, a regularization of the 
point charge having a support in a small ball around Rj with charge Zj. The nuclear 
repulsion energy can subsequently be represented as 



^ ^ f f b(r,R)b(r',R) , , , 

EUR) = -^j j ' dvdv'. 



(10) 



We remark that, while this differs from the expression in equation Q by the self-energy of 
the nuclei, the self-energy is an inconsequential constant depending only on the nuclear 
charge distribution, and is explicitly evaluated and subtracted from the total energy 
in numerical computations (cf. Appendix C in |31]). Subsequently, the electrostatic 
interaction energy, up to a constant self-energy, is given by the following variational 
problem: 

1 f f p{r)p{r') ^ ^, f ^ ^ 1 /■ f K^,R)Kr',R) ^ 

2] j nV^V J ^WK..t(r)dr+- j j |/_^.| dvdv' 

= -mf |i- j \V^{v)\^dv- J\p{v) + b{v,Rmv)dvY (11) 

where (j){v) denotes the trial function for the total electrostatic potential due to the 
electron density and the nuclear charge distribution and 3^ is a suitable function space 
discussed subsequently. 

Using the local reformulation of electrostatic interactions, the Kohn-Sham energy 
functional ^ can be rewritten as 

E{^, R) = sup L((/., R) , (12) 

where 

LicP,^,R) = Ts{^) + EM-j- J \VcP{v)\Uv + J\p{v) + b{v,Rmv)dv . (13) 

Subsequently, the problem of determining the ground-state energy and electron density 
for given positions of nuclei can be expressed as the following variational problem: 

inf E(^,R) , (14) 

where X = | '0i)xx{a,/3} = ^ij} with ( , )xx{a,/3} denoting the inner product 
defined on X x {a,/3}. X denotes a suitable function space that guarantees the existence 
of minimizers. We note that bounded domains are used in numerical computations, 
which in non-periodic calculations corresponds to a large enough domain containing the 
compact support of the wavefunctions and in periodic calculations correspond to the 
supercell. We denote such an appropriate bounded domain, subsequently, by 17. For 
formulations on bounded domains, X = 3^ = i?g(r2) in the case of non-periodic problems 



8 



Motamarri, Nowak, Leiter, Knap, & Gavini 



and X = y = Hp^^{^l) in the case of periodic problems are appropriate function spaces 
which guarantee existence of solutions (cf. e.g. [M])- Mathematical analysis of the Kohn- 
Sham DFT problem proving the existence of solutions in the more general case of M.^ 
(X = ii"^(M^)) has recently been reported [49] . 



2.2 Kohn-Sham eigenvalue problem 

The stationarity condition corresponding to the Kohn-Sham variational problem is equiv- 
alent to the non-linear Kohn-Sham eigenvalue problem given by: 

Tiipi = eiipi, (15) 

where 

71= (^-^V^ + V,s{p,R)^ (16) 

is a Hermitian operator with eigenvalues e^, and the corresponding orthonormal eigen- 
functions ipi for i = 1,2, ■ ■ ■ ,N denote the canonical wavefunctions. The electron density 
in terms of the canonical wavefunctions is given by 

N 

p(r) = ^|^,(x)p, (17) 
1=1 

and the effective single-electron potential, Vcf[{p,R), in (fT6]l is given by 

V,s{p, R) = VextiR) + Vh{p) + VUp) = VextiR) + ^ + ^ . (18) 

op dp 

As discussed in Section 12.11 it is efficient to compute the total electrostatic potential, 
defined as the sum of the external potential (VextiR)) and the Hartree potential (Vh-(p)), 
through the solution of a Poisson equation 

-^Vmr,R)=p{r) + b{r,R) , 

which is given by 

<A(r, R) ^ Vh{p) + VUr, R) = I dv' + / dv' . (19) 

J \v — v'\ J \v — r'l 

Finally, the system of equations corresponding to the Kohn-Sham eigenvalue problem are 
given by 

-^V + V,s{p, R)^ V* = ei^P^, (20a) 

N 

piv) = Y,m^)\', (20b) 

i=l 

-^V^(l){v,R)=p{v) + b{v,R); V,s{p,R) = ^{v,R) + ^-^ , (20c) 

which have to be solved with appropriate boundary conditions based on the problem 
under consideration. The formulation in ()20p represents a nonlinear eigenvalue problem 
which has to be solved self-consistently, and is subsequently discussed in Section HI Next 
we turn to the discrete formulation of the above Kohn-Sham eigenvalue problem. 
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2.3 Discrete Kohn-Sham eigenvalue problem 

If Xfi represents the finite-dimensional subspace with dimension Uh, the finite-element 
approximation of the various field variables (spatial part of the wavefunctions and the 
electrostatic potential) in the Kohn-Sham eigenvalue problem (|20p are given by 

V^f(r)= J^iV,-(r)^^' , (21) 
i=i 

■n-h 

<p\r)=Y,Njir)<P^ , (22) 
i=i 

where Nj : 1 < j < nfi denote the basis of Xfi. Subsequently, the discrete eigenvalue 
problem corresponding to (|20p is given by 

U^i = e^M^i , (23) 

where Hjk denotes the discrete Hamiltonian matrix, Mjk denotes the overlap matrix (or 
commonly referred to as the mass matrix in finite-element literature), and denotes i^^ 
eigenvalue corresponding to the eigenvector S^j. The expression for the discrete Hamil- 
tonian matrix ILjk for a non-periodic problem with X = 3^ = Hq{Q) as well as a periodic 
problem on a supercell with X = y = Hp^j.{Q) is given by 

Hjk = \ I ViVj(r) .ViVfc(r) dr + I K^f(r, R)Nj{v)Nk{r) dr . (24) 

We refer to [2Uj for the expression of Hj^ in the case of a periodic problem on a unit-cell 
using the Bloch ansatz. The discrete Kohn-Sham eigenvalue problem (|23p is a generalized 
eigenvalue problem with an overlap matrix Mj^ = Nj{r)Ni^{r) dr, which results from 
the non-orthogonality of the finite-element basis functions. However, the generalized 
eigenvalue problem (|23p can be transformed into a standard Hermitian eigenvalue problem 
as follows. Since the matrix M is positive definite symmetric, there exists a unique 
positive definite symmetric square root of M, and is denoted by M^/^. Hence, the 
following holds true 

H*', = e,''M^/2]y[i/2^. 

UW, = e^W^ (25) 

where 

H = M~^/2jjM~^/2 

We note that H is a Hermitian matrix, and (I25p represents a standard Hermitian eigen- 
value problem. The actual eigenvectors are recovered by the transformation = 
]V[-i/2»^^_ We remark that is a vector containing the expansion coefficients of the 
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eigenfunction ^/'f (r) expressed in an orthonormal basis spanning the finite-element space. 
Furthermore, we note that the transformation to a standard eigenvalue problem (j25p is 
computationally advantageous only if the matrix M~^/^ can be evaluated with modest 
computational cost. This is readily possible by using spectral finite-elements rather than 
conventional finite-elements, and is discussed in detail in Section [H 

The convergence of finite-element approximation for the Kohn-Sham DFT model was 
shown in [2l] using the notion of T— convergence. We also refer to the recent numerical 
analysis carried out on finite dimensional discretization of Kohn-Sham models [37], which 
also provides the rates of convergence of the approximation. Next, we derive the optimal 
coarse-graining rates for the finite-element meshes using the solution fields in the Kohn- 
Sham DFT problem. 

3 A-priori mesh adaption 

We propose an a priori mesh adaption scheme in the spirit of |33^ [32] by minimizing the 
error involved in the finite-element approximation of the Kohn-Sham DFT problem for 
a fixed number of elements in the mesh. The proposed approach closely follows the a 
priori mesh adaption scheme developed in the context of orbital-free DFT (31]. In what 
follows, we first derive a formal bound on the energy error \E — Eh\ as a function of the 
characteristic mesh-size h, and the distribution of electronic fields (wavefunctions and 
electrostatic potential). We note that, in a recent study, error estimates for a generic 
finite dimensional approximation of the Kohn-Sham model have been derived [37]. How- 
ever, the forms of these estimates are not useful for developing mesh-adaption schemes 
as the study primarily focused on proving the convergence of the finite-dimensional ap- 
proximation and determining the convergence rates. We first present the derivation of 
an error bound in terms of the canonical wavefunctions and the electrostatic potential, 
and subsequently develop an a priori mesh adaption scheme based on this error bound. 

3.1 Estimate of energy error 

In the present section and those to follow, we demonstrate our ideas on a system con- 
sisting of 2N electrons for the sake of simplicity and notational clarity. Let ' = 
, V^2^ • • • A\e^ = {e? , • • • e%}) and (* = {V^i , V^s • • • ^ ,e = {e, ,€2 ■ ■ ■ 
represent the solutions (spatial part of canonical wavefunctions, electrostatic potential, 
eigenvalues) of the discrete finite-element problem (I23p and the continuous problem ()20p 
respectively. In the following derivation and henceforth in this article, we consider all 
wavefunctions to be real-valued and orthonormal. We note that it is always possible 
to construct real-valued orthonormal wavefunctions for both non-periodic problems as 
well as periodic problems on the supercell. The wavefunctions are complex-valued for 
periodic problems on a unit-cell (with multiple k-points using the Bloch ansatz), and the 
following approach is still valid, but results in more elaborate expressions for the error 
bounds. Using the local reformulation of electrostatic interactions in the Kohn-Sham 
energy functional (equations (fT2|) -(fT3 |) ). the ground-state energy in the discrete and the 
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continuous problem can be expressed as: 



N 

2 5^ / ^iV^If dr+Z F(p(*"))(ir-i- / |V<A'f dr+Z" + 6)0^ dr , 

^ -j^ i/ if if? if if 



(26) 



(27) 



where 



i^(/5) = e^c{p)p ■ 



Proposition 3.1. In the neighborhood of (^,(j),€), the finite-element approximation 
error in the ground-state energy can be bounded as follows: 



N 

Eh-E\<2y2\l: [ |V#i|'f^r+ / {6i'if dr + / F'(p(* 
1=1 -^^ -^^ -^^ 



0)(<5V'.)'t^r 



+ 



+ 



n 



+ 2 



n 



(28) 



Proof. We first expand Eh{'^^,(j)^) about the solution of the continuous problem, i.e 
= * + J* and (j)'''' = + 6(j), and we get 

Ehi^ + 5^,^ + 6(1)) = 2^2 [ J|V(Vi* + 5Vi)P*+ / F{p{^ + 5^)) dr 

~l J a ^ J n 

f |V((^ + <5(/.)pdr+ [ (p{^ + 5^) + b) + 5(1)) dr , 
8vr Jo 

(29) 

which can then be simplified, using the Taylor series expansion, to 

N 

^;,(*^^'^) = 2V / h\V^iJ,\^ + \V6^Pi\^ + 2V^^}i■V5i,i)dr+ [ F{p{^))dr 

~l J SI ^ JQ 

N / N \ N 

+ 4V/ F'(p(*))^, dr + 8 / F"(p(*)) ( I dr + 2V / F'(p(*))(5^,)' 

N 

[ (|V0|2 + |V5(^p + 2V(^-V<5</.) dr+ /(/)(*) + 6)^ (ir + 4V [ 5iPi ^ dr 
Jq Jq i^J^ 

N N 

+ / {p{^) + b)6(t) dr + 2V / (#i)^<^ dr + 4V / ^Pi 6iPi dcj) dr + 0{6ipf , 6ipf6(P) . 

JQ J Q -^^ J Q 



(30) 
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We note that {'^,(j),e) satisfy the following Euler-Lagrange equations for each i = 
1,...,N. 

I I Vi)i-V5il)idv+ / F'{p{^))i)i5i^idv+ / i^i 5i^i 4) dr = li / i^idi^idv, (31a) 

^ J Q J Q J Q J Q 



-y- / V'/' • dr+ I + b)5(t) dr = . 

4vr Jn J a 



(31b) 



Using ([30]) and the Euler-Lagrange equations (|3T]) . we get 



Eh-E = 2y^l 



-iVJVil' + 2 e, JV'* + F'(p(*))(5^i)' 



dr + 8 ^ i^"(p(*)) * 



TV 



Tpi 5ipi 5(f) dr 



The orthonormality constraint functional in the discrete form is given by 

c(*'^)= / ^^i;^dr-6,, , 
J a 

and upon expanding about the solution 'I', we get 

c(*'') = / (V^i + SiJi){^Pj + Sil:j) dr - 5ij 
Jn 

= / [A'ipj + Si^ii^j + Sipjipi + Sipidil^j] dr - 6ij . 
Jn 

Using 

/ dr = 6ij , 
Jn 



and c(^ ) = in (j35]) . we get for i = i 



2 / ipidtl^idr 
'n 



n 



{d^Pifdr i = l,2,...,iV 



(32) 
(33) 

(34) 
(35) 

(36) 
(37) 



Using equations ([32]) and ()37p . we arrive at the following error bound in energy 



N 

\Eh-E\<2y\\ I \V5^,\^dr+ li [ {8^,fdr + [ F'{p{^ 
i=i J^ J^ J^ 



'W^ifdr 



+ 



+ 



n 



{Si^^f4> dr 



+ 2 



■ipi 5ipi 5(f) dr 



n 



1 



+ ^ \^^^\ dr 



J^F"{p{^))(^^,6^}j dr 



□ 
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Proposition 3.2. The finite-element approximation error in proposition \3.1\ expressed in 
terms of the approximation errors in electronic wave-functions and electrostatic potential 
is given by 



\Eu- 



E\<C\y2 II - ^^ Win +\^ - 4>%n + E II - 

\ i i 



'^-'ANli.n (38) 



Proof. We use the following norms: | • represents the semi-norm in space, || • 
denotes the norm, || • ||o,r2 and || • ||o,p,f2 denote the standard Lp' and LP norm 
respectively. All the constants to appear in the following estimates are positive and 
bounded. Firstly, we note that 

J^i/ |V5V^,|2dr<Ci5]|^,-^f|?,f,, (39) 

i ^ i 

J2 m I dv = Y, n I - v^f drKc^Y, w i>i - v^f iio,f^ • (40) 

r i i 

Using Cauchy-Schwartz and Sobolev inequalities, we arrive at the following estimate 



E 



F'{p{^)){5^ifdv 



dv 



T.h\2 



i 

= CsY,\\ F'iPm \\o,n\\ V^. - V^f ||g,4, 

i 

<C3J2\\^P^-i^1\\ln ■ 



Further, we note 



5vr Jn 



|V( 



lh\l2 



dr < C4I 



T/i|2 
^' \l,Q 



(41) 



(42) 



Using Cauchy-Schwartz and Sobolev inequalities we arrive at 



i 



|2 



(43) 
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Also, we note that 



E 



^pi Sipi 5(p dr 



n 



dr 



< X] II llo.6,n|| i>i - i>i \\o,n\\ (t)-4>^ 

i 



|o,3,n 



(44) 



where we made use of the generahzed Holder inequality in the first step and Sobolev 
inequality in the next. Finally, we use Cauchy-Schwartz inequality to arrive at 



ipi6iJi dr 



< ljF"{pm\ (^El^^r) (^''^^^'') * ^^^^ 

= E / \F"{pmpi^)mf\ dr 

<C7Y1 II 

i 

Using the bounds derived above, it follows that 



l,h ||2 



(46) 
(47) 



\ i i 



\o,nH-^^ \\i,n (48) 



□ 



We now bound the finite-element discretization error with interpolation errors, which 
in turn can be bounded with the finite-element mesh size h. This requires a careful 
analysis in the case of Kohn-Sham DFT and has been discussed in [S^. Using the results 
from the proof of Theorem 4.3 in [37], we bound the estimates in equation (j48p using the 
following inequalities (cf. [50] ) 



-tpi\i,n < C'olV'j -tpi\i,n < Co^h'^\ipi\k+i,n^ , 

e 

i^i-i'i \\o,n<Ci II ipi-Tpi \\o,n< Ci^h''^+^\iPi\k+i,n^ , 

e 

e 

U-4>'' \\o,n< C2U- \\o,n< C3 h'!+'\^\k+i,n. , 



(49a) 
(49b) 
(49c) 
(49d) 



where k is the order of the polynomial interpolation, and e denotes an element in the 
regular family of finite-elements [50] with mesh-size hg covering a domain Qg- Using the 
above estimates, the error estimate to 0{h'^^~^^) is given by 



\Eh-E\<cY,'^: 



2k 



Ei^^i 



(50) 
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3.2 Optimal coarse-graining rate 

Following the approach in [33], we seek to determine the optimal mesh-size distribution 
by minimizing the approximation error in energy for a fixed number of elements. Using 
the definition of the semi-norms, we rewrite equation (|50p as 



e=l "^^s i 



dr 



(51) 



where Ng denotes the total number of elements in the finite-element triangulation, and 
£)k+i denotes the {k + 1)*'^ derivative of any function. An element size distribution 
function h{r) is introduced so that the target element size is defined at all points r in fi, 
and we get 



6=1'^^'= i 

<C' [ h^\v) \D^^Hi{T)\^ + |Z)'^+V(r)P 
Jn ■ 

Further, the number of elements in the mesh is in the order of 



dr 



dr . 



(52) 
(53) 



n 



(54) 



The optimal mesh-size distribution is then determined by the following variational prob- 
lem which minimizes the approximation error in energy subject to a fixed number of 
elements: 

mm J {/i2'=(r)[^|Z)^+i'0i(r)|2 + |D^+i(^(r)|2]|cir subject to : J = TV^ . (55) 



The Euler-Lagrange equation associated with the above problem is given by 



2kh^^-^{r) |L»'=+Vi(r)|2 + \D 



^+^^fr)P 



3ry 



(56) 



where rj is the Lagrange multiplier associated with the constraint. Thus, we obtain the 
following distribution 



h{r) = |Z)'=+i^i(r)|2 + \D^+mr)\'') 



-l/(2fc+3) 



(57) 



where the constant A is computed from the constraint that the total number of elements 
in the finite-element discretization is N^,. 

The coarse-graining rate derived in equation (j57p has been employed to construct the 
finite-element meshes by using the a priori knowledge of the asymptotic solutions of (r) 
and (j){r) for different kinds of problems we study in the subsequent sections. 
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4 Numerical implementation 

We now turn to the numerical implementation of the discrete formulation of the Kohn- 
Sham eigenvalue problem described in Section [2j We first discuss the higher-order finite- 
elements used in our study with specific focus on spectral finite-elements, which are 
important in developing an efficient numerical solution procedure. 

4.1 Higher-order finite-element discretizations 

Linear finite-element basis has been extensively employed for a wide variety of appli- 
cations in engineering involving complex geometries and moderate levels of accuracy. 
On the other hand, much higher levels of accuracy (chemical accuracy) is desired in 
electronic structure computations of materials properties. To achieve the desired chem- 
ical accuracy, a linear finite-element basis is computationally inefficient since it requires 
a large number of basis functions per atom |29^ I22j . Hence, we investigate if higher- 
order finite-element basis functions can possibly be used to efficiently achieve the desired 
chemical accuracy. To this end, we employ in our study basis functions comprising of 
linear tetrahedral element (TET4) and spectral hexahedral elements up to degree eight 
(HEX27, HEX125SPECT, HEX343SPECT, HEX729SPECT). The numbers following 
the words 'TET' and 'HEX' denote the number of nodes in the element, and the suffix 
'SPECT' denotes that the element is a spectral finite-element. We note that spectral 
finite-elements [51^ [52] have been employed in a previous work in electronic structure cal- 
culations [30], but the computational efficiency afforded by these elements has not been 
thoroughly studied. We first briefly discuss spectral finite-elements (also referred to as 
spectral-elements) employed in the present work and the role they play in improving the 
computational efficiency of the Kohn-Sham DFT eigenvalue problem. 

The spectral-element basis functions employed in the present work are constructed 
as Lagrange polynomials interpolated through an optimal distribution of nodes corre- 
sponding to the roots of derivatives of Legendre polynomials, unlike conventional finite- 
elements which use equispaced nodes in an element. Such a distribution does not have 
nodes on the boundaries of an element, and hence it is common to append nodes on the 
element boundaries which guarantees basis functions. These set of nodes are usually 
referred to as Gauss-Lobatto-Legendre points. Furthermore, we note that conventional 
finite-elements result in a poorly conditioned discretized problem for a high order of in- 
terpolation, whereas spectral-elements are devoid of this deficiency [52] . The improved 
conditioning of the spectral-element basis was observed to provide a 2-3 fold compu- 
tational advantage over conventional finite-elements in a recent benchmark study [31) 
conducted to assess the computational efficiency of higher-order elements in the solution 
of the orbital-free DFT problem. 

A significant advantage of the aforementioned spectral-elements is realized when we 
conjoin their use with specialized Gaussian quadrature rules that have quadrature points 
coincident with the nodes of the spectral-element, which in the present case corresponds 
to the Gauss-Lobatto-Legendre (GLL) quadrature rule [53]. Importantly, the use of such 
a quadrature rule will result in a diagonal overlap matrix (mass matrix) M. To elaborate, 
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consider the elemental mass matrix given by 

/ iV,(r)iV,(r)dr= /' [' Ni{tv,C)Nj{tv,C) det{Je) di dr^ dQ (58) 

riq 

= X] ^P,<l,rNi{^p,r]q,Cr)Nj{^p,r]q,Cr) det{Je) (59) 

p,q,T=0 

where (.^, 77, C) represents the barycentric coordinates, Jg represents the elemental jaco- 
bian matrix of an element Og, and Uq denotes the number of quadrature points in each 
dimension in a tensor product quadrature rule. Since the quadrature points are coinci- 
dent with nodal points, the above expression is non-zero only if i = j, thus resulting in 
a diagonal elemental mass matrix and subsequently a diagonal global mass matrix. A 
diagonal mass matrix makes the transformation of the generalized Kohn-Sham eigenvalue 
problem (j23|) to a symmetric standard eigenvalue problem (j25p trivial. As discussed and 
demonstrated subsequently, the transformation to a standard eigenvalue problem allows 
us to use efficient solution procedures to compute the eigenspace in the self-consistent 
field iteration. We note that, while the use of the GLL quadrature rule is important 
in efficiently transforming the generalized eigenvalue problem to a standard eigenvalue 
problem, this quadrature rule is less accurate in comparison to Gauss quadrature rules. 
An n point Gauss-Lobatto rule can integrate polynomials exactly up to degree 2n — 3, 
while an n point Gauss quadrature rule can integrate polynomials exactly up to degree 
2n — 1. Thus, in the present work, we use the GLL quadrature rule only in the evaluation 
of the overlap matrix, while using the more accurate Gauss quadrature rule to evaluate 
the discrete Hamiltonian matrix H. 



4.2 Self-consistent field iteration 

As noted in Section [2l the Kohn-Sham eigenvalue problem represents a nonlinear eigen- 
value problem and must be solved self-consistently to compute the ground-state electron 
density and energy. We use computationally efficient schemes to evaluate the occupied 
eigenspace of the Kohn-Sham Hamiltonian (discussed below) in conjunction with finite 
temperature Fermi-Dirac distribution and charge density mixing to develop an efficient 
and robust solution scheme for the self-consistent field iteration of Kohn-Sham problem. 

Algorithm [1] depicts the typical steps involved in the self-consistent field (SCF) iter- 
ation. An initial guess of the electron density field is used to start the computation. A 
reasonable choice of such an initial guess is the superposition of atomic charge densities, 
and is used in the present study. The input charge density (/o(^(r)) to a self-consistent 
iteration is used to compute the total electrostatic potential (j){r, R) by solving the follow- 
ing discrete Poisson equation using a preconditioned conjugate gradient method provided 
by PETSc [M] package: 

VNjir) .ViVfc(r) dr] = (pfn(r) + K^, R)) Nj{r) dr . (61) 



k=l 

Subsequently, the effective potential Ves is evaluated to set up the discrete Kohn-Sham 
eigenvalue problem ([23]). We now discuss the different strategies we have investigated to 
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Algorithm 1 Self Consistent Field Iteration 

1. Provide initial guess for electron density Po(r) on the finite-element mesh. This will be 
the input electron density for the first self-consistent iteration (p|]^(r) = pQ(r)). 

2. Compute the total electrostatic potential 0'*(r, R) = VH-(p(^(r)) + Vext{b{^r, R)) by solving 
the discrete Poisson equation. 

3. Compute the effective potential, Ves{Pin, R) = Vxc{Pin) + R) ■ 

4. Solve for the occupied subspace spanned by the eigenfunctions 'i/'f (r), i = 1,2 ■ ■ ■ N, corre- 
sponding to N (N > N/2) smallest eigenvalues of the Kohn-Sham eigenvalue problem fl23p . 

5. Calculate the fractional occupancy factors (/») using the Fermi-Dirac distribution (Sec- 
tion (1123)) 

6. Compute the new output charge densities p^^i- from the eigenfunctions: 



7. If ||Pout(r) — pfn(r)|| < tolerance, stop; Else, compute new p[^ using a mixing scheme 
(Section l4.2.3p and go to step 2. 



compute the occupied eigenspace of the Kohn-Sham Hamiltonian H, and their relative 
merits. 

4.2.1 Solver strategies for finding the occupied eigenspace 

We examined two different solution strategies to compute the occupied subspace: (i) ex- 
plicit computation of eigenvectors at every self-consistent field iteration; (ii) A Chebyshev 
filtering approach. 

4.2.1.1 Explicit computation of eigenvectors: We first discuss the methods 
examined in the present work that involve an explicit computation of eigenvectors at a 
given self-consistent iteration. We recall that the discrete Kohn-Sham eigenvalue problem 
is a generalized Hermitian eigenvalue problem (GHEP) (I23p . As mentioned previously, 
by using the GLL quadrature rules for the evaluation of the overlap matrix M, which 
results in a diagonal overlap matrix, the generalized eigenvalue problem can be trivially 
transformed into a standard Hermitian eigenvalue problem (SHEP). We have explored 
both approaches in the present work, i.e. (i) solving the generalized eigenvalue problem 
employing conventional Gauss quadrature rules; (ii) solving the transformed standard 
eigenvalue problem by using GLL quadrature rules in the computation of overlap matrix. 

We have employed the Jacobi-Davidson (JD) method [55] to solve the GHEP. The 
JD method falls into the category of iterative orthogonal projection methods where the 
matrix is orthogonally projected into a lower dimensional subspace and one seeks an 
approximate eigenpair of the original problem in the subspace. The basic idea in JD 
method is to arrive at better approximations to eigenpairs by a systematic expansion of 




(60) 
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the subspace realized by solving a "Jacobi-Davidson correction equation" that involves 
the solution of a linear system. The correction equation is solved only approximately, 
and this approximate solution is used for the expansion of the subspace. Though the 
JD method has significant advantages in computing the interior eigenvalues and closely 
spaced eigenvalues, we found the JD method to be computationally expensive for systems 
involving the computation of eigenvectors greater than 50, due to the increase in the 
number of times the correction equation is solved. 

On the other hand, we employed the Krylov-Schur (KS) method [56] for solving the 
SHEP. In practice, one could also use the JD method to solve the SHEP, but, as previ- 
ously mentioned, the JD method is expensive to solve systems involving few hundreds 
of electrons and beyond. The KS method can be viewed as an improvement over tra- 
ditional Krylov subspace methods such as Arnoldi and Lanczos methods \57\ I58| . The 
KS method is based on Krylov-Schur decomposition where the Hessenberg matrix has 
the Schur form. The key idea of the KS method is to iteratively construct the Krylov- 
subspace using Arnoldi iteration and subsequently filter the unwanted spectrum from 
the Krylov-Schur decomposition. This results in a robust restarting scheme with faster 
convergence in most cases. 

We now demonstrate the computational efficiency realized by solving the discrete 
Kohn-Sham eigenvalue problem as a transformed SHEP in comparison to GHEP. To this 
end, we consider an all-electron simulation of a graphene sheet containing 16 atoms with 
96 electrons {N = 96) and a pseudopotential simulation of 3 x 3 x 3 face-centered-cubic 
aluminum nano-cluster containing 172 atoms with 516 electrons (A^ = 516) as benchmark 
systems. The relative error in the ground-state energy for the finite-element mesh used 
in the case of graphene is around 1.2 x 10"'^ while it is around 3.6 x 10"^ in the case of 
aluminium cluster. The reference ground-state energy is obtained using the commercial 
code GAUSSIAN in the case of the all-electron simulation of the graphene system, while it 
is obtained using the convergence study presented in Section [5] for the aluminium cluster. 
Table 1 shows the computational time taken for the first SCF iteration in each of the 
above cases. All the times reported in the present work represent the total CPU times. 
The Jacobi-Davidson method for GHEP and Krylov-Schur method for SHEP provided 



Table 1: Comparison of Generalized vs Standard eigenvalue problems 



Element Type 


DOFs 


Problem Type 




Time (GHEP) 


Time (SHEP) 


HEX125SPECT 
HEX343SPECT 


1368801 
2808385 


graphene 
Al 3 X 3 X 3 cluster 


96 
516 


1786 CPU-hrs 
2084 CPU-hrs 


150 CPU-hrs 
80 CPU-hrs 



by SLEPc package [59] have been employed in the present study. It is interesting to note 
that a 10-fold speedup is realized by transforming the Kohn-Sham eigenvalue problem 
to a SHEP in the case of graphene, while a 25-fold speedup was obtained in the case of 
aluminium cluster. We note that a similar observation was recently reported in [26] where 
the GHEP was transformed to SHEP via the mass-lumping approximation. Further, other 
simulations conducted as part of the present study suggest that this speedup increases 
with increasing system size. 
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4.2.1.2 Chebyshev filtering: We now examine the alternate approach of Cheby- 
shev filtering proposed in [35j, which is designed to iteratively compute the occupied 
eigenspace at every SCF iteration. We note that the Chebyshev filtering approach is 
only valid for standard eigenvalue problems. To this end, we use the aforementioned 
approach to convert the GHEP to a SHEP by employing the GLL quadrature rules in 
computing the overlap matrix, and remark that the use of spectral elements in conjunc- 
tion with the GLL quadrature is crucial in using the Chebyshev filtering technique to 
solve the Kohn-Sham eigenvalue problem in a finite-element basis. The Chebyshev fil- 
tering approach is based on a subspace iteration technique, where an initial subspace is 
acted upon by a Chebyshev filter constructed from the Kohn-Sham Hamiltonian that 
transforms the subspace to the occupied eigenspace. 

In the present work, at any given SCF iteration, we begin with the initial subspace V 
formed from the eigenvectors of the previous SCF iteration. We note that, as is the case 
with all subspace iteration techniques, we choose the dimension of the subspace V, N, 
to be larger than the number of filled ground-state orbitals. Typically, we choose N ~ 
^ + 20. This is also necessary to employ the finite temperature Fermi-Dirac smearing, 
discussed in Section 14.2.21 to stabilize the SCF iterations in materials systems that have 
very small band-gaps or have degenerate states at the Fermi energy. As proposed in [35], 
the Chebyshev filter is constructed from a shifted and scaled Hamiltonian, H = C1H + C2, 
where H is the transformed Hamiltonian in the SHEP (cf. equation (j25|) ). The constants 
ci and C2 which correspond to the scaling and shifting are determined such that the 
unwanted eigen-spectrum is mapped into [—1, 1] and the wanted spectrum into (—00, — 1). 
In order to compute these constants, we need estimates of the upper bounds of the 
wanted and unwanted spectrums. The upper bound of the unwanted spectrum, which 
corresponds to the largest eigenvalue of H, can be obtained inexpensively by using a small 
number of iterations of the Lanczos algorithm. The upper bound of the wanted spectrum 
is chosen as largest Rayleigh quotient of H in the space V from the previous SCF iteration. 
Subsequently, the degree-m Chebyshev filter, pm(H.), which magnifies the spectrum of H 
in (—00, — 1) — the wanted eigen-spectrum of H — transforms the initial subspace V to the 
occupied eigenspace of H. The degree of the Chebyshev filter and the number of filters 
employed are chosen such that the obtained space is a close approximation of the occupied 
space, with the residuals in the eigenvalue problem below a prescribed tolerance. We 
note that the action of the Chebyshev filter on V can be performed recursively, similar 
to the recursive construction of the Chebyshev polynomials [60j . After obtaining the 
occupied eigenspace, we orthogonalize the basis functions, and subsequently project H 
into the eigenspace to compute the eigenvalues that are used in the Fermi-Dirac smearing 
discussed in the next subsection. 

We now compare the computational times taken for a single SCF iteration solved 
using an eigenvalue solver based on Krylov-Schur method and the Chebyshev filter using 
the aforementioned benchmark problems comprising of a 16-atom graphene sheet and 
172-atom aluminium cluster. We use a Chebyshev polynomial of degree 800 for the 
graphene all-electron calculation and a polynomial degree of 12 for aluminum cluster 
pseudopotential calculation respectively. We remark that the degree of the polynomial 
required for the Chebyshev filter depends on the largest eigenvalue of the Hamiltonian H, 
which controls the ratio of the wanted to the unwanted eigen-spectrum and determines the 
effectiveness of a Chebyshev filter. We note that the largest eigenvalue is in turn related 
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Table 2: Comparison of Standard eigenvalue problem vs Chebysliev filtered sub- 
space iteration (ChFSI) 



Element Type 


DOFs 


Problem Type 


N 


Time (SHEP) 


Time (ChFSI) 


HEX125SPECT 
HEX343SPECT 


1368801 
2808385 


graphene 
Al 3 X 3 X 3 cluster 


96 

512 


150 CPU-hrs 
80 CPU-hrs 


12.5 CPU-hrs 
13 CPU-hrs 



to the finite-element discretization, which increases with decrease in the element-size of 
the finite-element mesh. In general, all-electron calculations require locally refined meshes 
near the nuclei as they involve Coulomb-singular potential fields and highly oscillatory 
core wavefunctions. Hence, a very high degree of Chebyshev polynomial (of the order of 
10^ — 10'^) needs to be employed to effectively filter the unwanted spectrum. On the other 
hand, simulations performed on systems with smooth external pseudopotential require 
Chebyshev polynomial degrees between 10 to 30. As is evident from the results, we clearly 
see a factor of 12 speedup that is obtained in the case of graphene, and a factor of around 
6 speedup that is obtained in the case of aluminium cluster. The speedup obtained was 
even greater for larger materials systems. 

The use of spectral finite-elements in conjunction with Chebyshev filtered subspace 
iteration presents an efficient and robust approach to solve the Kohn-Sham problem using 
the finite-element basis. For all the subsequent simulations reported in the present work, 
we employ the Krylov-Schur method for the first SCF iteration and use the Chebyshev 
filtering approach for all subsequent iterations to compute the occupied eigenspace. 



4.2.2 Finite temperature smearing: Fermi-Dirac distribution 

For materials systems with very small band gaps or those with degenerate energy levels 
at the Fermi energy, the SCF iteration may exhibit charge sloshing — a phenomenon 
where large deviations in spatial charge distribution are observed between SCF iterations 
with different degenerate (or close to degenerate) levels being occupied in different SCF 
iterations. In such a scenario, the SCF exhibits convergence in the ground-state energy, 
but not in the spatial electron density. It is common in electronic structure calculations 
to introduce an orbital occupancy factor [3] based on the energy levels and a smearing 
function to remove charge sloshing in SCF iterations. A common choice for the smearing 
function is the finite temperature Fermi-Dirac distribution, and the orbital occupancy 
factor fi corresponding to an energy level ej is given by 

/, = /(6„ ep) = — — , (62) 

l + exp(^^) 

where the smearing factor a = ksT with ks denoting the Boltzmann constant and T 
denoting the temperature in Kelvin. In the above expression, ep denotes the Fermi 
energy, which is computed from the constraint on the total number of electrons given by 
2/j = N. We note that the convergence of ground-state energy is quadratic in the 
smearing parameter a [3]. 
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4.2.3 Mixing scheme: 

The convergence of the SCF iteration is crucially dependent on the mixing scheme, and 
many past works in the development of electronic structure methods have focussed on this 
aspect |6H [36l [62} [63l I64| . In the present work, we employ an n-stage Anderson mixing 
scheme [36], which is briefly described below for the sake of completeness. Let Pi^"\y) 
and Poy"' (r) represent the input and output electron densities of the n*^ self-consistent 
iteration. The input to the {n + 1)**^ self-consistent iteration, p[^'"^^'(r), is computed as 
follows 

An = 7mix Pout + (1 - 7mix) Pin (63) 

where 

n-1 

An(out) = '^ri Pin(out) + 2^ /'in(out) (64) 
k=l 

and the sum of all the constants q is equal to one, i.e., 

Cl + C2 + C3 H \- Cn = I ■ (65) 

Using the above constraint, equation (j64p can be written as 



n-1 

Pm(out) — Pin(out) + 2^ '^^ i^An(out) ~ /'in(out) J ■ VOOJ 
k=l 

Denoting F = p^y^ — p[^, the above equation can be written as 

n-1 

F = F(") + ^Cfc(F('^-'=)-F(")) . (67) 



k=l 



The unknown constants ci to c„_i are determined by minimizing R = \\F\\2 = Wp^^ ~ 
p'outlli' which amounts to solving the following system of (n — 1) linear equations given 
by: 



n-1 



_ F{n-m)^jpin) _ jp{n-k)^ ^ ^^(n) _ p{n-m) ^ p{n)^ m = 1 ■ ■ ■ U - 1 (68) 



k=l 



where the notation (F, G) stands for the L2 inner product between functions F{r) and 
G{r) and is given by 

(F,G) = j F(r)G(r)dr . (69) 



The value of the parameter ^ndx in equation (|63p is chosen to be 0.5 in the present work. 
All the integrals involved in the linear system (j68p are evaluated using Gauss quadrature 



rules, and the values of Pi^'^^^^^ (r) are stored as quadrature point values after every n^^ self 
consistent iteration. In all the simulations conducted in the present work, the Anderson 
mixing scheme is used with full history. 



23 



Motamarri, Nowak, Leiter, Knap, & Gavini 



5 Numerical results 
5.1 Rates of convergence 

We begin with the examination of convergence rates of the finite-element approximation 
using a sequence of meshes with decreasing mesh sizes for various polynomial orders of 
interpolation. The benchmark problems used in this study, include: (i) all-electron cal- 
culations performed on boron atom and methane molecule, which represent non-periodic 
problems with a Coulomb-singular nuclear potential; (ii) pseudopotential calculations 
performed on a barium cluster that represents a non-periodic problem with a smooth 
external potential, and a bulk calculation of face-centered-cubic (FCC) calcium crystal. 
In the case of all-electron calculations, the nuclear charges are treated as point charges 
on the nodes of the finite-element triangulation and the discretization provides a regular- 
ization for the electrostatic potential. We note that the self-energy of the nuclei in this 
case is mesh-dependent and diverges upon mesh refinement. Thus, the self energy is also 
computed on the same mesh that is used to compute the total electrostatic potential, 
which ensures that the divergent components of the variational problem on the right 
hand side of equation ()lip and the self energy exactly cancel owing to the linearity of the 
Poisson equation (cf. Appendix C in [31] for a detailed discussion). 

We conduct the convergence study by adopting the following procedure. Using the a 
priori knowledge of the asymptotic solutions of the atomic wavefunctions [S^, we deter- 
mine the coarsening rate from equation (j57[) which is used to construct the coarsest mesh. 
Though the computed coarsening rates use the far-field asymptotic solutions instead of 
the exact ground-state wavefunctions that are a priori unknown, the obtained meshes 
nevertheless provide a systematic way for the discretization of vacuum in non-periodic 
calculations as opposed to using an arbitrary coarse-graining rate or uniform discretiza- 
tion. In the case of periodic pseudopotential calculations, a finite-element discretization 
with a uniform mesh-size is used. A uniform subdivision of the initial coarse-mesh is 
carried out to generate a sequence of refined meshes, which represents a systematic re- 
finement of the finite-element approximation space. The ground-state energies from the 
discrete formulation. Eh, obtained from the sequence of meshes constructed using the 
HEX125SPECT element and containing Ne elements are used to fit an expression of the 
form 

1^0 - Eh\ = C(l/iVe)''^/' , (70) 

to determine the constants Eq, C and k. The obtained value of Eq, which represents 
the extrapolated continuum ground-state energy computed using the HEX125SPECT 
element, is used as the reference energy to compute the relative error |^^^| in the 
convergence study of various orders of finite-elements reported in subsequent subsections. 

5.1.1 All-electron calculations 

We first begin with all-electron calculations by studying two examples: (i) boron atom 
(ii) methane molecule. 

5.1.1.1 Boron atom: This is one of the simplest systems displaying the full com- 
plexity of an all-electron calculation. For the present case, we use a Chebyshev filter of 
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order 500 to compute the occupied eigenspace. As discussed in Section 14.2.21 we use a 
finite-temperature smearing to avoid instabihty in the SCF iteration due to charge slosh- 
ing from the degenerate states at the Fermi energy. A smearing factor a = 0.0003168 Ha 
(T=100K) is used in the present study. We first determine the mesh coarse-graining 
rate by noting that the asymptotic decay of atomic wavefunctions is exponential, and an 
upper bound to this decay under the Hartree-Fock approximation is given by [34] 



ijj{r) ~ exp 



^2 e r 



for 



oo . 



(71) 



where — e denotes the energy of the highest occupied atomic/molecular orbital. While 
the above estimate has been derived for the Hartree-Fock formulation, it nevertheless 
provides a good approximation to the asymptotic decay of wavefunctions computed using 
the Kohn-Sham formulation. We use the aforementioned estimate, though not optimal, 
for all the wavefunctions in the atomic system, and adopt this approach for all systems 
considered subsequently. Hence, in equation ([57|) . we consider ipi to be 



exp 



r 



where ^ = v 2 e . 



(72) 



The electrostatic potential governed by the Poisson equation with a total charge density 
being equal to the sum of 5-0^ (r) and —h5{r) is given by 



-5 exp (-2^0 + ^ 



(73) 



Using the above equations, the mesh coarse-graining rate from equation (|57p is given by 



h{r) = A 



2fe+5 



exp (-2 ^ r) + 25 exp (-4 ^ r) 



n=Q ^ ^ 



k + l\ 2",c"(fc-Hl-7i)! 



-I 2' 



^l/(2A;+3) 



the value of e 



~h 



(74) 

determined on a 



Since e in the above equation is unknown a priori, 
coarse mesh is used in the above equation to obtain h{r). We now perform the numerical 
convergence study with tetrahedral and hexahedral spectral elements up to eighth order 
using this coarse-graining rate, and the results are shown in figure [TJ The value of Eq 
computed from equation ([70]) is —24.34319127 Ha, which is used to compute the relative 
errors in the energies. We observe that all the elements studied show close to optimal 
rates of convergence, 0{h^^), where k is the degree of the polynomial. An interesting 
point to note is that, although the governing equations are non-linear in nature and 
the nuclear potential approaches a Coulomb-singular solution upon mesh refinement, 
optimal rates of convergence are obtained. Recent mathematical analysis [37] shows that 
the finite-element approximation for the Kohn-Sham DFT problem does provide optimal 
rates of convergence for pseudopotential calculations. To the best of our knowledge, 
mathematical analysis of higher-order finite-element approximations of the Kohn-Sham 
DFT problem with Coulomb-singular nuclear potentials is still an open problem. 

We note that, in the case of linear finite-elements, a large number of elements are 
required to even achieve modest relative errors. In fact, close to five million linear TET4 
elements are required for a single boron atom to obtain a relative error of 10-2, while 
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Figure 1: Convergence rates for the finite-element approximation of a single boron 
atom. 



relative errors up to 10"^ are achieved with just few hundreds of HEX125SPECT and 
HEX343SPECT elements, and even higher accuracies are achieved with a few thousands 
of these elements. 



5.1.1.2 Methane molecule: The next example we study is methane with a C-H 
bond length of 2.07846 a.u. and a C-H-C tetrahedral angle of 109.4712°. For the present 
case, we use a Chebyshev filter of order 500 to compute the occupied eigenspace, and a 
smearing factor a = 0.0003168 Ha (T=100K) for the Fermi-Dirac smearing. As in the 
case of boron atom, the finite-element mesh for this molecule is constructed to be locally 
refined around the atomic sites, while coarse-graining away. The mesh coarsening rate 
in the vacuum is determined numerically by employing the asymptotic solution of the 
far-field electronic fields, estimated as a superposition of single atom far-field asymptotic 
fields, in equation (I57p . To this end, asymptotic behavior of the atomic wavefunctions in 
carbon atom ('0*^(r)) is chosen to be 

where ^ = V2I , (75) 

where e (negative of the eigenvalue of the highest occupied eigenstate) is determined 
from a coarse mesh calculation of single carbon atom. The corresponding electrostatic 
potential is governed by the Poisson equation, with total charge density being equal to 



exp 
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the sum of 6|?/''^(r)p and — 65(r), and is given by 



-6exp(-2^r) ( C + - 



(76) 



In the case of hydrogen atom, the analytical solution is given by 



— exp 



and the corresponding electrostatic potential is given by 



exp(-2r) ( 1 + - 



(77) 



(78) 



We now perform the numerical convergence study with both tetrahedral and hexahedral 
elements with the meshes constructed as explained before. Figure [2] shows the convergence 
results for the various elements, and figure E] shows the isocontours of electron density for 
methane molecule. The value of Eq, the reference ground-state energy of the methane 
molecule, is found to be —40.12028478 Ha. As in the case of boron atom, we obtain close 
to optimal convergence rates, and significantly higher relative accuracies in ground-state 
energies are observed by using higher-order elements. 
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Figure 2: Convergence rates for the finite-element approximation of a methane 
molecule. 
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5.1.2 Pseudopotential calculations 

We now turn to pseudopotential calculations in multi-electron systems. A pseudopoten- 
tial constitutes the effective potential of the nucleus and core electrons experienced by 
the valence electrons. Pseudopotentials are constructed such that the wavefunctions of 
valence electrons outside the core and their corresponding eigenvalues are close to those 
computed using all-electron calculations. In the present work, we use the local evanes- 
cent core pseudopotential |65] as a model pseudopotential to demonstrate our ideas. This 
pseudopotential has the following form 

VL = -;| (^(1 - (1 + - Ae-y'^ , (79) 

where Z denotes the number of valence electrons and y = \r — Ri\/Rc. The core decay 
length Rc and q > are atom-dependent constants [65] • The constants /3 and A are 
evaluated by the following relations: 

^= 4(^2_i) - A = -a'-a^. (80) 

5.1.2.1 Barium cluster: The first pseudopotential calculation we present is a bar- 
ium 2x2x2 body-centered cubic (BCC) cluster with a lattice parameter of 9.5 a.u.. 
A Chebyshev filter of order 16 is employed to compute the occupied eigenspace, and a 
smearing factor a = 0.000634 Ha (T=200K) is used for the Fermi-Dirac smearing. The 
finite-element mesh for this molecule is constructed to be uniform in the cluster region 
where barium atoms are present, while coarse-graining away. The mesh coarsening rate 
in the vacuum is determined numerically by employing the asymptotic solution of the 
far-field electronic fields, estimated as a superposition of single atom far-field asymptotic 
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fields, in equation ()57p . To this end, asymptotic behavior of the atomic wavefunctions in 
barium atom (V'(r)) is chosen to be 



r 



where ^ = Vzl , 



(81) 



where e (negative of the eigenvalue of the highest occupied eigenstate) is estimated from 
a coarse mesh calculation. The corresponding electrostatic potential is determined by 
the Poisson equation, with total charge density being equal to the sum of 2'i/i^(r) and 
—2S{r), and is given by 



(r) = -2exp(-2^r) ( C + - 
1 r 



(82) 



The numerical convergence study is conducted with both tetrahedral and hexahedral el- 
ements, and Figure m shows the rates of convergence for the various elements considered 
that are close to optimal rates of convergence. The value of £"0, the energy per atom, 
is found to be —0.638599384 Ha. The main observation that distinguishes this study 
from the all-electron study is that all orders of interpolation provide much greater accu- 
racies for pseudopotential calculations in comparison to all-electron calculations. Linear 
basis functions are able to approximate the ground-state energies up to relative errors 
of 10~^, whereas relative errors of 10~^ can be achieved with higher-order elements with 
polynomial degrees of four and above. 
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Figure 4: Convergence rates for the finite-element approximation of 2 x 2 x 2 barium 
cluster. 
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Electron Density 
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Figure 5: Electron density contours of 2 x 2 x 2 barium cluster. 

5.1.2.2 Perfect crystal with periodic boundary conditions: The next ex- 
ample considered is that of a perfect calcium face-centered cubic (FCC) crystal with 
lattice constant 10.55 a.u.. A Bloch ansatz [17j is used in the simulation with 10 k-points 
(high symmetry) to sample the first Brillouin zone, which represents a quadrature rule of 
order 2 [66]. The eigenspace is computed using the Krylov-Schur method, and a smear- 
ing parameter of 0.003168 Ha (T=1000K) is used in these simulations. Figure [6] shows 
the rates of convergence for the various higher-order finite-elements considered in the 
present work. The value of Eq, the reference bulk energy per atom, is computed to be 
—0.729544738 Ha. We note that the results are qualitatively similar to the pseudopo- 
tential calculations on barium cluster. 

:o-\ . . 





Figure 6: Convergence rates for the finite-element approximation of bulk calcium. 
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5.2 Computational cost 

We now examine the key aspect of computational efficiency afforded by the use of higher- 
order finite-element approximations in the Kohn-Sham DFT problem. As seen from 
the results in Section [5. H higher-order finite-element discretizations provide significantly 
higher accuracies with far fewer elements in comparison to linear finite-elements. How- 
ever, the use of higher-order elements increases the per-element computational cost due 
to an increase in the number of nodes per element, which also results in an increase in the 
bandwidth of the Hamiltonian matrix. Further, higher-order elements require a higher- 
order accurate quadrature rule, which again increases the per-element computational 
cost. Thus, in order to unambiguously determine the computational efficiency afforded 
by higher-order finite-element discretizations, we measured the CPU-time taken for the 
simulations conducted on the aforementioned benchmark problems for a wide range of 
meshes providing different relative accuracies. All the simulations are conducted using 
meshes with the coarse-graining rates determined by the approach outlined in Section [3^21 
All the numerical simulations reported in this work are conducted using a parallel imple- 
mentation of the code based on MPI, and are executed on a parallel computing cluster 
with the following specifications: dual-socket six-core Intel Core 17 CPU nodes with 12 
total processors (cores) per node, 48 GB memory per node, and 40 Gbps Infiniband 
networking between all nodes for fast MPI communication. The various benchmark cal- 
culations were executed using 1 to 12 cores, while the results for the larger problems 
discussed subsequently were executed on 48 to 96 cores. It was verified (see Section [5^ 
that our implementation scales linearly on this parallel computing platform for the range 
of processors used, and hence the total CPU-times reported for the calculation are close 
to the wall-clock time on a single processor. The number of processors used to conduct 
ABINIT and GAUSSIAN simulations for the comparative studies, discussed subsequently, 
are carefully chosen to ensure scalability of these codes, and are typically less than 20 
cores. 

5.2.1 Benchmark systems 

We first consider the benchmark systems comprising of boron atom, methane molecule, 
barium cluster and bulk calcium crystal. The mesh coarsening rates for these benchmark 
systems derived in Section [5TT] are employed in the present study. The number of elements 
are varied to obtain finite-element approximations with varying accuracies that target 
relative energy errors in the range of IQ-^ - lO"'^. We employ the same numerical 
algorithms and algorithmic parameters — order of Chebyshev filter, finite-temperature 
smearing parameter — as discussed in Section 15.11 for the present study. The total CPU- 
time is measured for each of these simulations on the series of meshes constructed for 
varying finite-element interpolations and normalized with the longest time in the series 
of simulations for a given material system. The relative error in ground-state energy is 
then plotted against this normalized CPU-time. Figures 171 [8l [91 and [TOl show these results 
for boron, methane molecule, barium cluster and bulk calcium crystal, respectively. 

Our results show that the computational efficiency of higher-order interpolations im- 
proves as the desired accuracy of the computations increases, in particular for relative 
errors of the order of 10^^ and lower, which corresponds to chemical accuracy. We note 



31 



Motamarri, Nowak, Leiter, Knap, & Gavini 



that a thousand-fold computational advantage is obtained with higher-order elements 
over linear TET4 element even for modest accuracies corresponding to relative errors of 
10~^. For relative errors of 10"'^, quadratic HEX27 element performance is similar to 
other finite-elements with quartic interpolation and beyond, and sometimes marginally 
better. However, all higher-order elements significantly outperform linear TET4 ele- 
ment. Considering relative errors of 10-^ quartic HEX125SPECT element performs 
better in comparison to quadratic HEX27 element almost by a factor of 10, while hexic 
HEX343SPECT element is computationally more efficient than HEX125SPECT element 
by a factor greater than three and this factor improves further for lower relative errors. 
The octic HEX729SPECT element performs only marginally better than the hexic ele- 
ment for relative errors lower than 10~^. Comparing the results across different materials 
systems, we observe that the performance of lower-order elements is inferior in the case 
of all-electron systems in comparison to pseudopotential systems. For instance, at a rela- 
tive error of 10~^, the solution time using TET4 is more than three orders of magnitude 
larger than HEX343SPECT for the case of methane molecule. However, the solution time 
is three orders of magnitude larger for TET4 over HEX343SPECT for the case barium 
cluster at a relative error of 10"'^. 

In summary, for chemical accuracies corresponding to relative errors lower than 10~^, 
the computational efficiency improves significantly with the order of the element up to 
sixth-order, with diminishing returns beyond. Further, the relative performance of higher- 
order elements with respect to linear TET4 element in the case of all-electron calculations 
is significantly better in comparison to pseudopotential calculations. Lastly, qualitatively 
speaking, the sequence of graphs of relative error vs. normalized CPU-time for the various 
elements tend towards increasing accuracy and computational efficiency with increasing 
order of finite-element interpolation. However, we note that, for the systems studied, the 
point of diminishing returns in terms of computational efficiency of higher-order elements 
for relative errors commensurate with chemical accuracy is around sixth-order. 

5.2.2 Large materials systems 

In this section, we further investigate the computational efficiency afforded by higher- 
order finite-elements by considering larger material systems involving both pseudopoten- 
tial and all-electron calculations. As a part of this investigation, we demonstrate the 
effectiveness of the higher-order finite-elements by comparing the solution times of pseu- 
dopotential calculations against plane-wave basis set and solution times of all-electron 
calculations against a Gaussian basis set providing similar relative accuracy in the ground- 
state energy. The systems chosen as a part of this study are aluminium clusters containing 
3x3x3, 5x5x5, 7x7x7 FCC unit cells for the case of pseudopotential calculations, 
and a graphene sheet containing 100 atoms in the case of all-electron calculations. 

5.2.2.1 Pseudopotential calculations: The pseudopotential calculations on alu- 
minum clusters are conducted using the evanescent core pseudopotential [65]. All the 
simulations in these case studies use superposition of single-atom electron densities as 
the initial guess for the electron density in the first SCF iteration. We used the Krylov- 
Schur iteration for solving the eigenvalue problem in the first SCF iteration and used 
Chebyshev filtered subspace iteration for the subsequent SCF iterations. The order of 
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Figure 7: Computational efficiency of various orders of finite-element approxima- 
tions. Case study: boron atom. 
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Figure 8: Computational efficiency of various orders of finite-element approxima- 
tions. Case study: methane molecule. 
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Figure 9: Computational efficiency of various orders of finite-element approxima- 
tions. Case study: barium 2x2x2 BCC cluster. 
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Figure 10: Computational efficiency of various orders of finite-element approxima- 
tions. Case study: bulk calcium FCC crystal. 
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Chebyshev filters used for tlie 3x3x3, 5x5x5 and 7x7x7 aluminum clusters are 12, 30 
and 50 respectively. All simulations are conducted using a finite temperature Fermi-Dirac 
smearing parameter of 0.0003168 Ha (T=100K). In order to conduct a one-to-one com- 
parison, the plane-wave simulations are also conducted using the same pseudopotential 
and finite temperature Fermi-Dirac smearing used in the finite-element simulations. 



Aluminium 3x3x3 cluster: 

We first consider an aluminium cluster containing 3x3x3 FCC unit cells with a lat- 
tice spacing of 7.45 a.u.. The system comprises of 172 atoms with 516 electrons. The 
finite-element mesh for this calculation is chosen to be uniform in the cluster region con- 
taining aluminium atoms, while coarse-graining away. The mesh coarsening rate in the 
vacuum is determined numerically by employing the asymptotic solution of the far-field 
electronic fields, estimated as a superposition of single atom far-field asymptotic fields, 
in equation (j57p . To this end, the asymptotic behavior of atomic wavefunctions in an 
aluminium atom ('(/'(r)) is chosen to be 

[¥ f 1 r-~ 
V'(^) = Y — 6xp — ^ r where ^ = v3e, (83) 

where e (negative of the eigenvalue of the highest occupied eigenstate) is determined from 
a single aluminum atom coarse mesh calculation. The corresponding total electrostatic 
potential, governed by the Poisson equation with total charge density being equal to the 
sum of 3'0^(r) and — 3(5(r), is given by 

4>{r) = -3 exp (-2^ r) ( ^ + - ) . (84) 



Table 3: Convergence with finite-element basis for a 3 x 3 x 3 FCC aluminum cluster 
using HEX125SPECT element 



Degrees of freedom 


Energy per atom (eV) 


Relative error 


184, 145 


-54.1076597 


3.4 xlO-2 


1,453,089 


-56.0076146 


1.8 xlO-^ 


11,546,177 


-56.01788889 


1.3 xl0~6 



We obtain the converged value of the ground-state energy by following the procedure 
outlined in Section [5. 11 We use a sequence of increasingly refined HEX125SPECT finite- 
element meshes on a cubic simulation domain of side 400 a.u., and compute the ground- 
state energy for these meshes which are tabulated in Table [3j Using the extrapolation 
procedure discussed in Section [5T] (equation ([70]) ). we compute the reference ground-state 
energy (energy per atom) to be £"0 = —56.0179603 eV . The relative errors reported in 
Table [3] are with respect to this reference energy, and this reference energy will be used 
to compute the relative errors for all subsequent simulations for this material system. 
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Table 4: Comparison of higher-order finite-element (FE) basis with plane-wave 
basis for a 3 X 3 X 3 FCC aluminum cluster 



Type of basis set 


Relative error 


Time (CPU-hrs) 


Plane-wave basis (cut-off 
30 Ha, cell-size of 60 a.u.) 


3.3 xl0"6 


646 


FE basis (HEX343SPECT, 
2, 808, 385 nodes, domain 
size: 200 a.u.) 


3.6 xlO"*^ 


371 



In order to assess the performance of higher-order finite-elements on this material sys- 
tem, we conduct the finite-element simulation with a mesh containing HEX343SPECT 
elements and compare the computational CPU-time against a plane-wave basis code 
ABINIT [5] solved to a similar relative accuracy in the ground-state energy. The finite- 
element simulation has been performed on a cubic domain size of 200 a.u. with a mesh 
coarsening rate away from the cluster of atoms as determined using equations (j57p , (jSSh , ([84 
The resuhing mesh contains 12, 800 HEX343SPECT elements with 2, 808, 385 nodes. The 
plane-wave basis simulation has been performed by using a cell-size of 60 a.u. and a cut- 
off energy of 30 Ha with one k-point to obtain the ground-state energy of similar relative 
accuracy as the finite-element simulation. The solution times for the finite-element basis 
and the plane-wave basis are tabulated in Table Si These results demonstrate that the 
performance of higher-order finite-element discretization is comparable, in fact better by 
a two-fold factor, to the plane-wave basis for this material system. Figure [TT] shows the 
electron density contours on the mid-plane of the 3x3x3 aluminum cluster from the 
finite-element simulation. 
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Figure 11: Electron density contours of 3 x 3 x 3 FCC aluminium cluster. 
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Aluminium 5x5x5 cluster: 

We next consider an aluminium cluster containing 5x5x5 FCC unit cells with a lattice 
spacing of 7.45 a.u.. This material system comprises of 666 atoms with 1998 electrons. 
The finite-element mesh is constructed along similar lines as the 3x3x3 cluster, where a 
uniform mesh resolution is chosen in the cluster region containing aluminium atoms and 
coarse-graining away into the vacuum with a numerically determined coarsening rate as 
discussed earlier. As before, we first obtain the reference ground-state energy by using 
a sequence of increasingly refined HEX125SPECT finite-element meshes with a cubic 
simulation domain of side 800 a.u. and extrapolating the computed ground-state energies 
on these meshes (cf. Table [5]). The reference ground-state energy (energy per atom), 
thus determined, is Eq = -56.0495071 eV. 

Table 5: Convergence with finite-element basis for a 5 x 5 x 5 FCC cluster using 
HEX125SPECT element 



Degrees of freedom 


Energy per atom(eV) 


Relative error 


394, 169 


-54.8536312 


2.1 xl0~2 


3,124, 593 


-56.0425334 


1.2 xlO"^ 


24,883,937 


-56.0494500 


1.01 xlO~^ 



We now assess the performance of higher-order finite-elements on this material system 
in comparison to a plane-wave basis. The finite-element simulation in this case has been 
performed on a simulation domain size of 400 a.u. containing 36, 064 HEX343SPECT 
elements with 7, 875, 037 nodes. The plane-wave basis simulation conducted using the 
ABINIT package has been performed on a cell-size of 80 a.u. and a cut off energy of 30 Ha 
with one k-point to sample the Brillouin zone to obtain the ground-state energy of similar 
relative accuracy. The solution time for the finite-element basis and the plane-wave basis 
are tabulated in Table [U which shows that using higher-order finite-elements one can 
achieve similar computational efficiencies as afforded by a plane- wave basis, at least in 
the case of non-periodic calculations. Figure [12] shows the electron density contours on 
the mid-plane of the 5x5x5 FCC cluster from the finite-element simulation. 

Aluminium 7x7x7 cluster: 

As a final example in our case study of pseudopotential calculations, we study an alu- 
minium cluster containing 7x7x7 FCC unit cells with a lattice spacing of 7.45 a.u. This 
material system comprises of 1688 atoms with 5064 electrons. We only use the finite- 
element basis to simulate this system as the plane-wave basis calculation was beyond 
reach for this material system with the computational resources at our disposal. The 
finite-element simulation has been performed on a cubic simulation domain with a side of 
800 a.u.. The finite-element mesh was constructed as described in the simulation of other 
aluminum clusters, and comprised of 69, 984 HEX343SPECT elements with 15, 257, 197 
nodes. The computed energy per atom for this aluminum cluster is —56.06826762 eV, 
and figure [13] shows the electron density contours on the mid-plane of the cluster. 
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Table 6: Comparison of higher-order finite-element (FE) basis with plane- wave 
basis sets for a 5 x 5 x 5 FCC aluminum cluster 



Type of basis set 


Relative error 


Time (CPU-hrs) 


Plane-wave basis (cut-off 
30 Ha, cell-size of 80 a.u) 


2.1 xlO-5 


7307 


FE basis ( HEX343SPECT, 
7, 875, 037 nodes, domain 
size: 400 a.u.) 


7.9 xlO-6 


6619 



Electron Densi 

0.03623 
B- 0.02717 

(0.01812 
0.009058 
0.0000 

Figure 12: Electron density contours of 5 x 5 x 5 FCC aluminium cluster. 
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Figure 13: Electron density contours of 7 x 7 x 7 FCC aluminium cluster. 



5.2.2.2 All-electron calculations: We now demonstrate the performance of higher- 
order finite-element discretization in the case of all-electron calculations, by considering a 
graphene sheet as our benchmark problem. We simulate a graphene sheet containing 100 
atoms (600 electrons) with a C-C bond length of 2.683412 a.u.. We first obtain a con- 
verged value of the ground-state energy by conducting simulations using the GAUSSIAN 
package [H] using the polarization consistent DFT basis sets (pc-n), which have been 
demonstrated to provide a systematic convergence in Kohn-Sham DFT calculations [67] . 
Since these basis sets are not directly available in the GAUSSIAN package, we introduce 
them as an external basis set for conducting these simulations. The ground-state energy 
value obtained for triple-zeta pc-3 basis set is taken as the reference value (Eq) in this 
study, which is computed to be Eq = —37.7619141 Ha per atom. 

We assess the performance of higher-order finite-elements on this material system 
by comparing the computational CPU-time against the pc-2 basis set, which provides 
similar relative accuracy in the ground-state energy with respect to the Eq determined 
above. The finite-element mesh for this problem is chosen to be uniform in the region 
containing carbon atoms with local refinement around each atom while coarse-graining 
away into vacuum. The mesh coarsening rate in the vacuum is determined numerically 
by employing the asymptotic solution of the far-field electronic fields, estimated as a su- 
perposition of single atom far-field asymptotic fields, in equation (I57p . To this end, the 
asymptotic behavior of the atomic wavefunctions in carbon atom (t/^ir)) is chosen to be as 
in equation ([75]) . Since the GAUSSIAN package does not account for partial occupancy 
of energy levels, we suppress the Fermi-Dirac smearing in the finite-element simulations 
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for the present case in order to conduct a one-to-one comparison. Table [7] shows the 
relevant results of the simulation with figure [14] showing the electron density contours of 
the graphene sheet. We remark that the finite-element simulation with HEX125SPECT 
elements is ten-fold slower than the GAUSSIAN simulation with pc basis set. We note 
that the pc basis set is highly optimized for specific material systems, which is reflected 
in the far fewer basis functions required for these calculations. We believe this is the 
main reason for the superior performance of Gaussian basis, which, however, may not 
be transferable to generic material systems — for e. g. metallic systems. We also note 
that the computational time using finite-element basis functions can possibly be reduced 
significantly by enriching the finite-element shape functions with single atom wavefunc- 
tions using the partitions-of- unity approach [681 [69] . The degree of freedom advantage 
of the partitions-of-unity approach for Kohn-Sham calculations has been first demon- 
strated in [42], and presents a promising future direction for all-electron Kohn-Sham 
DFT calculations. 

Table 7: 100 atom graphene sheet (600 electrons) 



Type of basis set 


Relative error 


Time (CPU-hrs) 


pc2 (Gaussian, 3, 000 basis functions) 


1.06 xlO"^ 


666 


FE basis (HEX125SPECT, 8, 004, 003 nodes) 


1.2 xlO-^ 


7461 




5.3 Scalability of finite-element basis: 

The parallel scalability of our numerical implementation is demonstrated in Figure [151 
We study the strong scaling behavior by measuring the relative speedup with increasing 
number of processors on a fixed problem of constant size, which is chosen to be the 
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aluminum 3x3x3 cluster discretized with HEX125SPECT elements containing 3.91 
million degrees of freedom. The speedup is measured relative to the computational CPU- 
time taken on 2 processors, as a single processor run was beyond reach due to memory 
limitations. It is evident from the figure, that the scaling is almost linear. The relative 
speedup corresponding to 96-fold increase in the number of processors is 87.82, which 
translates into an efficiency of 91.4%. 
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Number of Processors 

Figure 15: Relative speedup as a function of the number of processors. 



6 Conclusions 

In the present study, we have analyzed numerically the higher-order adaptive finite- 
element discretization of the Kohn-Sham DFT problem. The present work is focussed 
towards demonstrating the significant computational efficiency in electronic structure cal- 
culations that is afforded by using an adaptive higher-order spectral finite-element dis- 
cretization in conjunction with appropriate solution strategies. We use the self-consistent 
field formulation of the Kohn-Sham DFT problem as our starting point. In order to aid 
our investigation, we first developed estimates for the discretization error in the ground- 
state energy in terms of the ground-state electronic fields (wavefunctions and electrostatic 
potential) and characteristic mesh-size. These error estimates and the a priori knowl- 
edge of the asymptotic solutions of far-field electronic fields were used to construct mesh 
coarsening rates for the various benchmark problems considered in this work. Since the 
finite-element discretization of the Kohn-Sham problem results in a generalized eigenvalue 
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problem, which is computationahy expensive to solve, we presented an approach to triv- 
ially transform this into a standard eigenvalue problem by using spectral finite-elements 
in conjunction with the Gauss-Lobatto quadrature rules that results in a diagonal overlap 
matrix. We subsequently examined two different strategies to solve the Kohn-Sham prob- 
lem: (i) explicit computation of eigenvectors at every self-consistent field iteration; (ii) a 
Chebyshev filtering approach that directly computes the occupied eigenspace. We found 
that the Chebyshev filtered approach presents with significant computational savings for 
both pseudopotential, as well as, all-electron calculations. 

Using the derived error estimates and the a priori knowledge of the asymptotic solu- 
tions of far-field electronic fields, we constructed close to optimal finite-element meshes 
for the various benchmark problems, which include all-electron calculations on systems 
comprising of boron atom, methane molecule and pseudopotential calculations on barium 
cluster and bulk calcium crystal. We employed the Chebyshev filtering approach on the 
transformed standard eigenvalue problem in our numerical investigations to study the 
computational efficiency of higher-order finite-element discretizations. To this end, we 
first investigated the performance of higher-order elements by studying the convergence 
rates of linear tetrahedral element and hexahedral spectral-elements up to sixth-order. 
In all the benchmark problems considered, we observed close to optimal rates of conver- 
gence for the finite-element approximation in the ground-state energy. Importantly, we 
note that optimal rates of convergence were obtained for all orders of finite-element ap- 
proximations, considered in this work, even for all-electron Kohn-Sham DFT calculations 
with Coulomb-singular potentials, the mathematical analysis of which, to the best of our 
knowledge, is an open question to date. 

We further investigated the computational efficiency afforded by the use of higher- 
order finite-elements up to eighth-order spectral-elements. To this end, we used the mesh 
coarsening rates determined from the proposed mesh adaption scheme and studied the 
CPU time required to solve the benchmark problems. Our results demonstrate that 
significant computational savings can be realized by using higher-order elements. For 
instance, a staggering 1000— fold savings in terms of CPU-time are realized by using 
sixth-order hexahedral spectral-element in comparison to linear tetrahedral element. We 
also note that the point of diminishing returns in terms of computational efficiency was 
determined to be around sixth-order for the benchmark systems we examined. To further 
assess the performance of higher-order finite-elements, we extended our investigations to 
study large materials systems and compared the computational CPU-time with commer- 
cially available plane-wave and Gaussian basis codes. We first conducted pseudopotential 
simulations on aluminium clusters containing 172 atoms and 666 atoms using sixth-order 
spectral-element in our implementation, as well as, the plane-wave basis in ABINIT 
package solved to the similar relative accuracy in the ground-state energy. These stud- 
ies showed that the computational CPU-time required for the finite-element simulations 
is lesser in comparison to plane-wave basis sets underscoring the fact that higher-order 
finite-elements can compete with plane-waves, at least in non-periodic settings, when 
employed in conjunction with efficient solution strategies. Furthermore, we were able 
to compute the electronic structure of an aluminium cluster containing 1 , 688 atoms by 
employing the sixth-order spectral-element, which was not possible using ABINIT due to 
large memory requirements. Next, we examined the computational efficiency in the case 
of all-electron calculations on a graphene sheet containing 100 atoms. The all-electron 
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calculations were conducted using polarization consistent Gaussian DFT basis sets and 
the fourth-order spectral-element basis, and we observed that the computational time for 
the finite-element basis was 10— fold greater than the Gaussian basis. 

The prospect of using higher-order spectral finite-elements as basis functions, in con- 
junction with the proposed solution strategies, for Kohn-Sham DFT electronic struc- 
ture calculations is indeed very promising. While finite-elements have the advantages of 
handling complex geometries and boundary conditions and exhibit good scalability on 
massively parallel computing platforms, their use has been limited in electronic structure 
calculations as their computational efficiency compared unfavorably to plane-wave and 
Gaussian basis functions. The present study shows that the use of higher-order discretiza- 
tions can alleviate this problem, and presents a useful direction for electronic structure 
calculations using finite-element discretization. Further, the computational cost in the 
case of all-electron calculations can be further reduced by enriching the finite-element 
shape functions with single-atom wavefunctions using the partitions-of-unity approach, 
and is currently being studied. Last, but not the least, the implications of using higher- 
order spectral finite-element approximations in the development of a linear scaling Kohn- 
Sham DFT formulation is a worthwhile subject for future investigation. 
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